Abstract. In a recent paper, Hauenstein, Sturmfels, and the second author discovered a conjectural bijection between critical points of the likelihood function on the complex variety of matrices of rank $r$ and critical points on the complex variety of matrices of co-rank $r - 1$. In this paper, we prove that conjecture for rectangular matrices and for symmetric matrices, as well as a variant for skew-symmetric matrices.



1. Introduction and results 

For an $m \times n$ data table $U = (u_{ij}) \in \mathbb{N}^{m \times n}$, we define the likelihood function $\ell_U : T^{m \times n} \to T$, where $T = \mathbb{C}^*$ is the complex one-dimensional torus, as $\ell_U(Y) = \prod_{ij} y_{ij}^{u_{ij}}$ for $Y = (y_{ij})_{ij} \in T^{m \times n}$. This terminology is motivated by the following observation. If $Y$ is a matrix with positive real entries adding up to 1, interpreted as the joint probability distribution of two random variables taking values in $[m] := \{1, \ldots, m\}$ and $[n] := \{1, \ldots, n\}$, respectively, then up to a multinomial coefficient depending only on $U$, $\ell_U(Y)$ is the probability that when independently drawing $\sum_{ij} u_{ij}$ pairs from the distribution $Y$, the number of pairs equal to $(i,j)$ is $u_{ij}$. In other words, $\ell_U(Y)$ is the likelihood of $Y$, given observations recorded in the table $U$. A standard problem in statistics is to maximize $\ell_U(Y)$.

Without further constraints on $Y$ this maximization problem is easy: it is uniquely solved by the matrix $Y$ obtained by scaling $U$ to lie in said probability simplex. But various meaningful statistical models require $Y$ to lie in some subvariety $X$ of $T^{m \times n}$. For instance, in the model where the first and second random variable are required to be independent, one takes $X$ equal to the intersection of the variety of matrices of rank 1 with the hyperplane $\sum_{ij} y_{ij} = 1$ supporting the probability simplex. Taking mixtures of this model, one is also led to intersect said hyperplane with the variety of rank-$r$ matrices.
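The two easy cases just mentioned can be checked in exact arithmetic. The following is a minimal sketch, assuming a small hypothetical data table `U`; the function names are illustrative and not taken from the paper. It evaluates $\ell_U$, the unconstrained maximizer $U / u_{++}$, and the classical rank-one (independence) maximizer with entries $u_{i+} u_{+j} / (u_{++})^2$.

```python
from fractions import Fraction
from math import prod


def likelihood(U, Y):
    """Evaluate l_U(Y) = prod_ij y_ij ** u_ij."""
    return prod(y ** u for row_u, row_y in zip(U, Y) for u, y in zip(row_u, row_y))


def saturated_mle(U):
    """Scale U so that its entries add up to 1 (no rank constraint)."""
    total = sum(map(sum, U))
    return [[Fraction(u, total) for u in row] for row in U]


def independence_mle(U):
    """Rank-one matrix built from the row and column sums of U."""
    total = sum(map(sum, U))
    rows = [sum(r) for r in U]
    cols = [sum(c) for c in zip(*U)]
    return [[Fraction(ri * cj, total * total) for cj in cols] for ri in rows]


U = [[4, 2], [1, 3]]  # hypothetical data table, chosen only for illustration
Y_sat = saturated_mle(U)
Y_ind = independence_mle(U)
print(likelihood(U, Y_sat), likelihood(U, Y_ind))
```

Since the independence model is a subset of the probability simplex, the first printed value is at least as large as the second.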

For general $X$, the maximum-likelihood estimate is typically much harder to find (though in the independence model it is still well-understood). One reason for this is that the restriction of $\ell_U$ to $X$ may have many critical points. Under suitable assumptions, this number of critical points is finite and independent of $U$ (for sufficiently general $U$), and is called the maximum likelihood degree or ML-degree of $X$. Finiteness and independence of $U$ holds, for instance, for smooth closed subvarieties of a torus [Huh12], but also for all varieties $X$ studied in this paper [HRS12, HKS05] (which are smooth but not closed, and become closed but singular if one takes the closure).

We take X to be a smooth, irreducible, locally closed, complex subvariety of a 
torus. Doing so, we tacitly shift attention from the statistical motivation to complex 
geometry — in particular, we no longer worry whether the critical points counted by 
the ML-degree lie in the probability simplex or are even real-valued matrices. 

The set of all critical points for varying data matrices $U$ has a beautiful geometric interpretation: Given $P \in X$ and a vector $V$ in the tangent space $T_P X$ to $X$ at $P$, the derivative of $\ell_U$ at $P$ in the direction $V$ equals $\ell_U(P) \cdot \sum_{ij} \frac{v_{ij}}{p_{ij}} u_{ij}$. This vanishes if and only if $U$ is perpendicular, in the standard symmetric bilinear form on $\mathbb{C}^{m \times n} = \mathbb{C}^{mn}$, to the entry-wise quotient $\frac{V}{P}$ of $V$ by $P$. This leads us to define

$$\mathrm{Crit}(X) := \left\{ (P, U) \;\middle|\; \frac{T_P X}{P} \perp U \right\} \subseteq X \times \mathbb{C}^{m \times n},$$

which is called the variety of critical points of $X$ in [Huh12], except that there $U$ varies over projective space and the closure is taken. By construction, $\mathrm{Crit}(X)$ is smooth and irreducible, and has dimension $mn$; indeed, it is a vector bundle over $X$ of rank $mn - \dim X$. The ML-degree of $X$ is well-defined if and only if the projection $\mathrm{Crit}(X) \to \mathbb{C}^{m \times n}$ is dominant, in which case the degree of this rational map is the ML-degree of $X$.
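The derivative formula behind the definition of $\mathrm{Crit}(X)$ can be sanity-checked numerically. The sketch below, under the assumption of a hypothetical positive real point `P`, direction `V` and table `U`, compares the logarithmic derivative $\sum_{ij} u_{ij} v_{ij} / p_{ij}$ (the second factor in the formula above) with a central finite difference of $\log \ell_U$.

```python
from math import log


def log_likelihood(U, P):
    """log l_U(P) = sum_ij u_ij * log(p_ij), for positive real P."""
    return sum(u * log(p) for ru, rp in zip(U, P) for u, p in zip(ru, rp))


def directional_log_derivative(U, P, V):
    """The factor sum_ij u_ij * v_ij / p_ij from the derivative formula."""
    return sum(u * v / p
               for ru, rp, rv in zip(U, P, V)
               for u, p, v in zip(ru, rp, rv))


def central_difference(U, P, V, h=1e-6):
    """Numerical derivative of t -> log l_U(P + t V) at t = 0."""
    plus = [[p + h * v for p, v in zip(rp, rv)] for rp, rv in zip(P, V)]
    minus = [[p - h * v for p, v in zip(rp, rv)] for rp, rv in zip(P, V)]
    return (log_likelihood(U, plus) - log_likelihood(U, minus)) / (2 * h)


# Hypothetical inputs, chosen only for illustration.
U = [[4, 2], [1, 3]]
P = [[0.1, 0.2], [0.3, 0.4]]
V = [[1.0, -1.0], [0.5, -0.5]]
exact = directional_log_derivative(U, P, V)
approx = central_difference(U, P, V)
print(exact, approx)
```

The direction `V` here is arbitrary; criticality on a variety $X$ asks for the first quantity to vanish for every `V` in the tangent space $T_P X$.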

In this paper, motivated by [HRS12], we consider three choices for $X$, all given by rank constraints: First, in the rectangular case, we order $m, n$ such that $m \le n$, fix a rank $r \in [m]$, and take $X$ equal to

$$\mathcal{M}_r := \Big\{ P \in T^{m \times n} \;\Big|\; \sum_{ij} p_{ij} = 1 \text{ and } \operatorname{rk} P = r \Big\}.$$


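Second, in the symmetric case, we take $m = n$ and take $X$ equal to

$$\mathcal{SM}_r := \left\{ P = \begin{pmatrix} 2p_{11} & p_{12} & \cdots & p_{1m} \\ p_{12} & 2p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{1m} & p_{2m} & \cdots & 2p_{mm} \end{pmatrix} \in T^{m \times m} \;\middle|\; \sum_{i \le j} p_{ij} = 1 \text{ and } \operatorname{rk}(P) = r \right\}.$$
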

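Third, in the skew-symmetric or alternating case, we take $m = n$ and, for even $r \in [m]$, take $X$ equal to

$$\mathcal{AM}_r := \left\{ P = \begin{pmatrix} 0 & p_{12} & \cdots & p_{1m} \\ -p_{12} & 0 & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ -p_{1m} & -p_{2m} & \cdots & 0 \end{pmatrix} \in \mathbb{C}^{m \times m} \;\middle|\; \sum_{i < j} p_{ij} = 1,\ \operatorname{rk}(P) = r, \text{ and } \forall i < j : p_{ij} \neq 0 \right\}.$$
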

Minor modifications of the likelihood function are needed in the latter two cases: we define $\ell_U(P) := \prod_{i \le j} p_{ij}^{u_{ij}}$ in the symmetric case, and $\ell_U(P) := \prod_{i < j} p_{ij}^{u_{ij}}$ in the alternating case.

In [HRS12], using the numerical algebraic geometry software bertini [BHSW06], the ML-degree of $\mathcal{M}_r$ is computed for various values of $r, m, n$ with $r \le m \le n$. The numbers are listed in Table 1. Observe that the numbers for rank $r$ and rank $m - r + 1$ coincide. The natural conjecture put forward in that paper is that this always holds [HRS12, Conjecture 1.2], and that there is an explicit bijection between the two sets of critical points [HRS12, Conjecture 4.2]. Moreover, similar results were conjectured for symmetric matrices. We will prove these conjectures, for which we use the term ML-duality suggested to us by Sturmfels.

Table 1. ML-degrees of $\mathcal{M}_r$ for small values of $r \le m \le n$. [Legible column labels $(m,n)$: (3,3), (4,5), (4,6); legible entries: 26, 191, 843, 3119, 6776; 843, 3119, 61326.]

Theorem 1 (ML-duality for rectangular matrices and for symmetric matrices). Fix a rank $r \in [m]$ and let $U \in \mathbb{N}^{m \times n}$ with $m \le n$ ($m = n$ in the symmetric case) be a sufficiently general data matrix (symmetric in the symmetric case). Then there is an explicit involutive bijection between the critical points of $\ell_U$ on $\mathcal{M}_r$ (respectively, $\mathcal{SM}_r$) and the critical points of $\ell_U$ on $\mathcal{M}_{m-r+1}$ (respectively, $\mathcal{SM}_{m-r+1}$). In particular, the ML-degrees of $\mathcal{M}_r$ and $\mathcal{M}_{m-r+1}$ (respectively, $\mathcal{SM}_r$ and $\mathcal{SM}_{m-r+1}$) coincide. Moreover, the product $\ell_U(P)\ell_U(Q)$ is the same for all pairs consisting of a rank-$r$ critical point $P$ and the corresponding rank-$(m - r + 1)$ point $Q$.

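In the alternating case, the ML-dual of $\mathcal{AM}_r$ turns out not to be some $\mathcal{AM}_s$ but rather an affine translate of a determinantal variety defined as follows. Let $S$ be the skew $m \times m$-matrix

$$S := \begin{pmatrix} 0 & 1 & \cdots & 1 & 1 \\ -1 & 0 & \cdots & 1 & 1 \\ \vdots & & \ddots & & \vdots \\ -1 & -1 & \cdots & 0 & 1 \\ -1 & -1 & \cdots & -1 & 0 \end{pmatrix}$$

and for even $s \in \{0, \ldots, m - 1\}$ consider the variety

$$\mathcal{AM}'_s := \{ P \in \mathbb{C}^{m \times m} \mid P \text{ skew},\ \forall i < j : p_{ij} \neq 0, \text{ and } \operatorname{rk}(S - P) = s \}.$$

Note that, unlike in $\mathcal{AM}_r$, the upper triangular entries of $P \in \mathcal{AM}'_s$ are not required to add up to 1.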

Theorem 2 (ML-duality for skew matrices). Fix an even rank $r \in \{2, \ldots, m\}$ and let $U \in \mathbb{N}^{m \times m}$ be a sufficiently general symmetric data matrix with zeroes on the diagonal. Let $s \in \{0, \ldots, m - 2\}$ be the largest even integer less than or equal to $m - r$. Then there is an explicit involutive bijection between the critical points of $\ell_U$ on $\mathcal{AM}_r$ and the critical points of $\ell_U$ on $\mathcal{AM}'_s$. In particular, the ML-degrees of $\mathcal{AM}_r$ and $\mathcal{AM}'_s$ coincide. Moreover, the product $\ell_U(P)\ell_U(Q)$ is the same for all pairs consisting of a rank-$r$ critical point $P$ on $\mathcal{AM}_r$ and the corresponding rank-$s$ point $Q$ on $\mathcal{AM}'_s$.
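The dual rank $s$ in Theorem 2 is a simple function of $m$ and $r$. As a small illustration (the helper name `dual_skew_rank` is hypothetical), it can be computed as follows.

```python
def dual_skew_rank(m, r):
    """Largest even integer s with s <= m - r, for an even rank r in {2, ..., m}."""
    if r % 2 != 0 or not 2 <= r <= m:
        raise ValueError("r must be an even integer with 2 <= r <= m")
    s = m - r
    return s - (s % 2)


for m in range(2, 7):
    print(m, [(r, dual_skew_rank(m, r)) for r in range(2, m + 1, 2)])
```

For odd $m - r$ the rank drops by one relative to $m - r$, reflecting that skew-symmetric matrices have even rank.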

The proof is similar in each of the three cases. First, we determine the tangent space to $X$ at a critical point $P$ of $\ell_U$ for sufficiently general $U$. It turns out that this space is spanned by certain rank-one or rank-two matrices. Imposing that $P$ be a critical point, i.e., that the derivative of $\ell_U$ vanishes in each of these low-rank directions, leads to the conclusion that a certain matrix $Q$, determined from $P$ using some involution involving the fixed matrix $U$, has rank at most $m - r + 1$ (or $s$ in the skew case) and is itself a critical point on the variety of matrices of its rank. Letting $k \le m - r + 1$ (respectively, $k \le s$) be the generic rank of the matrices $Q$ thus obtained, we reverse the roles of $P$ and $Q$ to argue that $k$ must equal $m - r + 1$ (respectively, $s$), thus establishing the result. In the remainder of this paper we fill in the details in each of the three cases, in particular making the involution $P \mapsto Q$ explicit.

Acknowledgments 

We thank Bernd Sturmfels for his encouragement and comments on early versions 
of this paper. 

2. Maximum likelihood duality in the rectangular case 

Let $m \le n$ be natural numbers and let $\mathcal{M}_r \subseteq T^{m \times n}$ denote the variety of $m \times n$-matrices of rank $r$ whose entries sum up to 1. Fix a sufficiently general data matrix $U = (u_{ij})_{ij} \in \mathbb{N}^{m \times n}$, which gives rise to the likelihood function $\ell_U : \mathcal{M}_r \to T$, $\ell_U(P) = \prod_{ij} p_{ij}^{u_{ij}}$. Let $P \in \mathcal{M}_r$ be a critical point for $\ell_U$, which means that the derivative of $\ell_U$ vanishes on the tangent space $T_P \mathcal{M}_r$ to $\mathcal{M}_r$ at $P$. This tangent space equals

(1) $T_P \mathcal{M}_r = \{ X = (x_{ij})_{ij} \in \mathbb{C}^{m \times n} \mid X \ker P \subseteq \operatorname{im} P \text{ and } \sum_{ij} x_{ij} = 0 \}.$

Here the first condition ensures that $X$ is tangent at $P$ to the variety of rank-$r$ matrices (see, e.g., [Har92, Example 14.6]) and the second condition ensures that $X$ is tangent to the hyperplane where the sum of all matrix entries is 1.

Given $X \in T_P \mathcal{M}_r$, the derivative of $\ell_U$ in that direction equals $\ell_U(P) \cdot \sum_{ij} \frac{x_{ij} u_{ij}}{p_{ij}}$, which vanishes if and only if the second factor vanishes. We will now prove that the marginals of $P$ are proportional to those of $U$ (see also [HRS12, Remark 4.6]). We write $\mathbf{1}$ for the all-one vectors in both $\mathbb{C}^m$ and $\mathbb{C}^n$, and use self-explanatory notation such as $u_{i+} := \sum_j u_{ij}$ and $u_{++} := \sum_{ij} u_{ij}$.

Lemma 3. The column vector $P\mathbf{1}$ is a non-zero scalar multiple of $U\mathbf{1}$ and the row vector $\mathbf{1}^T P$ is a non-zero scalar multiple of $\mathbf{1}^T U$.

Proof. We prove the first statement; the second statement is proved similarly. We want to show that the $2 \times 2$-minors of the $m \times 2$-matrix $(P\mathbf{1} \mid U\mathbf{1})$ vanish. We give the argument for the upper minor. Let $X = (x_{ij})$ be the $m \times n$-matrix whose first row equals $p_{2+}$ times the first row of $P$, whose second row equals $-p_{1+}$ times the second row of $P$, and all of whose other rows are zero. Then $X \in T_P \mathcal{M}_r$, so that the derivative $\sum_{ij} \frac{x_{ij} u_{ij}}{p_{ij}}$ is zero. On the other hand, substituting $X$ into $\sum_{ij} \frac{x_{ij} u_{ij}}{p_{ij}}$ yields $u_{1+} p_{2+} - u_{2+} p_{1+}$, hence this minor is zero as desired. The scalar multiple in both cases is $\frac{p_{++}}{u_{++}} = \frac{1}{u_{++}}$, which is non-zero. □

Define $Q = (q_{ij})_{ij}$ by $p_{ij} q_{ij} = u_{i+} u_{ij} u_{+j}$. This is going to be our dual critical point, up to a normalization factor that we determine now.

Lemma 4. The sum $\sum_{ij} q_{ij}$ equals $(u_{++})^3$.

Proof. By Lemma 3 the rank-one matrix $Y$ defined by $y_{ij} = u_{i+} u_{+j}$ has image contained in $\operatorname{im} P$. Hence it satisfies the linear condition $Y \ker P \subseteq \operatorname{im} P$, but not the condition $\sum_{ij} y_{ij} = 0$. Similarly, $P$ itself satisfies $P \ker P \subseteq \operatorname{im} P$, but not $\sum_{ij} p_{ij} = 0$. Hence, we can decompose $Y$ uniquely as $cP + X$ where $c \in \mathbb{C}$ and where $X$ satisfies $X \ker P \subseteq \operatorname{im} P$ and $\sum_{ij} x_{ij} = 0$, i.e., where $X \in T_P \mathcal{M}_r$. Then we have

$$\sum_{ij} q_{ij} = \sum_{ij} \frac{y_{ij} u_{ij}}{p_{ij}} = \sum_{ij} \frac{c\, p_{ij} u_{ij}}{p_{ij}} + \sum_{ij} \frac{x_{ij} u_{ij}}{p_{ij}} = \sum_{ij} c\, u_{ij} + 0 = c\, u_{++}$$

by criticality of $P$. The scalar $c$ equals

$$\frac{\sum_{ij} y_{ij}}{\sum_{ij} p_{ij}} = \frac{(u_{++})^2}{1},$$

which proves the lemma. □

We will use rank-one matrices in the tangent space $T_P \mathcal{M}_r$. We equip both $\mathbb{C}^m$ and $\mathbb{C}^n$ with their standard symmetric bilinear forms.

Lemma 5. The tangent space $T_P \mathcal{M}_r$ at $P$ is spanned by all rank-one matrices $vw^T$ satisfying the following two conditions:

• $v \in \operatorname{im} P$ or $w \perp \ker P$; and

• $v \perp \mathbf{1}$ or $w \perp \mathbf{1}$.

In the proof we will need that $\operatorname{im} P$ is not contained in the hyperplane $\mathbf{1}^\perp$ and that, dually, $\ker P$ does not contain $\mathbf{1}$. These conditions will be satisfied by genericity of $U$.

Proof. The first condition ensures that the rank-one matrices in the lemma map $\ker P$ into $\operatorname{im} P$, and the second condition ensures that the sum of all entries of those rank-one matrices is zero, so that they lie in $T_P \mathcal{M}_r$, see (1). To show that these rank-one matrices span the tangent space $T_P \mathcal{M}_r$, decompose $\mathbb{C}^m$ as $A \oplus B \oplus C$ where $A \oplus C = \mathbf{1}^\perp$ and $A \oplus B = \operatorname{im} P$. Here we use that $\operatorname{im} P$ is not contained in the hyperplane $\mathbf{1}^\perp$.

Similarly, decompose $\mathbb{C}^n = A' \oplus B' \oplus C'$ where $A' \oplus C'$ is the hyperplane $\mathbf{1}^\perp$ and $A' \oplus B' = (\ker P)^\perp$; here we use the second genericity assumption on $P$. These spaces have the following dimensions:

$\dim A = r - 1$, $\dim B = 1$, $\dim C = m - r$,

$\dim A' = r - 1$, $\dim B' = 1$, $\dim C' = n - r$.

The space spanned by the rank-one matrices in the lemma has the space $(B \otimes B') \oplus (C \otimes C')$ as a vector space complement. The dimension of this complement is $1 + (m - r)(n - r)$, which is also the codimension of $\mathcal{M}_r$. □

Let $R = \operatorname{diag}(u_{i+})_i$ and $K = \operatorname{diag}(u_{+j})_j$ be the diagonal matrices recording the row and column sums of $U$ on their diagonals. Then, by Lemma 3, $P\mathbf{1}$ is a scalar multiple of $R\mathbf{1}$ and $\mathbf{1}^T P$ is a scalar multiple of $\mathbf{1}^T K$. This implies that, in the decompositions in the proof of Lemma 5, we may take $B$ spanned by $U\mathbf{1} = R\mathbf{1}$ and $B'$ spanned by $U^T\mathbf{1} = K\mathbf{1}$. Note that $P, Q$ satisfy $P * Q = RUK$, where $*$ denotes the Hadamard product.

Observe also that criticality of $P$ is equivalent to $v^T R^{-1} Q K^{-1} w = 0$ for all rank-one matrices $vw^T$ as in Lemma 5. This criterion will be used in the proof of our duality result for $\mathcal{M}_r$.

Theorem 6 (ML-duality for rectangular matrices). Let $U \in \mathbb{N}^{m \times n}$ be a sufficiently general data matrix and let $P$ be a critical point of $\ell_U$ on $\mathcal{M}_r$. Define $Q = (q_{ij})_{ij}$ by $q_{ij} p_{ij} = u_{i+} u_{ij} u_{+j}$. Then $Q / (u_{++})^3$ is a critical point of $\ell_U$ on $\mathcal{M}_{m-r+1}$.

Before proceeding with the proof, we point out that the construction of $Q' := Q / (u_{++})^3$ from $P$ is symmetric in $P$ and $Q'$. As a consequence, the map $P \mapsto Q'$ from critical points of $\ell_U$ on $\mathcal{M}_r$ to critical points on $\mathcal{M}_{m-r+1}$ is a bijection. Moreover, it has the property that $\ell_U(P) \cdot \ell_U(Q')$ depends only on $U$. In particular, if one lists the critical points $P \in \mathcal{M}_r$ with positive real entries in order of decreasing log-likelihood, then the corresponding $Q' \in \mathcal{M}_{m-r+1}$ appear in order of increasing log-likelihood, since the sum $\log \ell_U(P) + \log \ell_U(Q')$ depends only on $U$.
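The extreme case $r = m$ gives a quick consistency check of the theorem. On $\mathcal{M}_m$ the only critical point is the scaled table $P = U / u_{++}$, and the formula for $Q'$ then returns the rank-one independence estimate with entries $u_{i+} u_{+j} / (u_{++})^2$, which lies on $\mathcal{M}_{m-m+1} = \mathcal{M}_1$. The sketch below assumes a hypothetical full-rank table `U` with non-zero entries; the function names are illustrative only.

```python
from fractions import Fraction
from math import prod


def margins(U):
    """Row sums u_{i+}, column sums u_{+j}, and the total u_{++}."""
    rows = [sum(r) for r in U]
    cols = [sum(c) for c in zip(*U)]
    return rows, cols, sum(rows)


def likelihood(U, Y):
    """l_U(Y) = prod_ij y_ij ** u_ij."""
    return prod(y ** u for ru, ry in zip(U, Y) for u, y in zip(ru, ry))


def dual_point(U, P):
    """Q' with entries u_{i+} u_ij u_{+j} / (p_ij * u_{++}^3)."""
    rows, cols, total = margins(U)
    return [[Fraction(rows[i] * U[i][j] * cols[j]) / (P[i][j] * total ** 3)
             for j in range(len(cols))] for i in range(len(rows))]


def product_invariant(U):
    """prod_ij (u_{i+} u_ij u_{+j} / u_{++}^3) ** u_ij, which depends only on U."""
    rows, cols, total = margins(U)
    return prod(Fraction(rows[i] * U[i][j] * cols[j], total ** 3) ** U[i][j]
                for i in range(len(rows)) for j in range(len(cols)))


U = [[4, 2, 2], [1, 3, 2]]  # hypothetical 2 x 3 table of rank 2
rows, cols, total = margins(U)
P = [[Fraction(u, total) for u in row] for row in U]  # critical point on M_2
Q_dual = dual_point(U, P)                              # should lie on M_1
independence = [[Fraction(ri * cj, total ** 2) for cj in cols] for ri in rows]
print(Q_dual == independence)
```

Applying `dual_point` to `Q_dual` recovers `P`, which illustrates the involutive nature of the construction in this special case.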

Proof. Lemma 4 takes care of the normalization factor, which we therefore ignore during most of this proof. We first show that $Q$ has rank at most $m - r + 1$. For this we take arbitrary $v$ in the space $A = \mathbf{1}^\perp \cap \operatorname{im} P$ from the proof of Lemma 5 and arbitrary $w \in \mathbb{C}^n$, so that $vw^T \in T_P \mathcal{M}_r$. From $v^T R^{-1} Q K^{-1} w = 0$ we conclude that $R^{-1} \operatorname{im} Q \subseteq A^\perp$ because $v$ was arbitrary in $A$. Equivalently, since $R$ is diagonal and hence symmetric, we conclude that $\operatorname{im} Q \subseteq R A^\perp = (R^{-1} A)^\perp$. The latter space has dimension $m - r + 1$, which is therefore an upper bound on the rank of $Q$.

Similarly, for $w \in A'$ and any $v \in \mathbb{C}^m$, the matrix $vw^T$ lies in the tangent space $T_P \mathcal{M}_r$, and we find $v^T R^{-1} Q K^{-1} w = 0$. Since $v$ was arbitrary, this means that $Q K^{-1} w = 0$, so $\ker Q$ contains $K^{-1} A'$, a space of dimension $r - 1$. If $n > m$, however, then by the above the kernel of $Q$ strictly contains $K^{-1} A'$.

Next we prove that for any rank-one matrix $xy^T$ such that

• $x \perp R^{-1} A$ or $y \perp K^{-1} A'$; and

• $x \perp \mathbf{1}$ or $y \perp \mathbf{1}$

we have $\sum_{ij} \frac{x_i u_{ij} y_j}{q_{ij}} = 0$. Note that the conclusion can be written as $x^T R^{-1} P K^{-1} y = 0$, and observe the similarity with the characterization of $T_P \mathcal{M}_r$ in Lemma 5 that will give us conditions of criticality of $Q$.

Given arbitrary $y \in \mathbb{C}^n$ we can write $P K^{-1} y$ as $v + c R \mathbf{1}$ with $v \in A$. Then for $x \in (R^{-1} A)^\perp$ perpendicular to $\mathbf{1}$ we find

$$x^T R^{-1} P K^{-1} y = x^T R^{-1} (v + c R \mathbf{1}) = 0 + c\, x^T \mathbf{1} = 0,$$

as desired. If, on the other hand, $x \in (R^{-1} A)^\perp$ is not perpendicular to $\mathbf{1}$ but $y \in \mathbb{C}^n$ is, then writing $w := K^{-1} y$ we claim that $v := Pw$ lies in $A$. For this we compute the dot product

$$\mathbf{1}^T P w = \tfrac{1}{u_{++}} \mathbf{1}^T U w = \tfrac{1}{u_{++}} \mathbf{1}^T K w = \tfrac{1}{u_{++}} \mathbf{1}^T y = 0,$$

where the first equality is justified by Lemma 3. Hence, again, $x^T R^{-1} P K^{-1} y = x^T R^{-1} v = 0$. The checks for the case where $y \perp K^{-1} A'$ are completely analogous.

Now denote the rank of $Q$ by $k$, so that $k \le m - r + 1$. From $\operatorname{im} Q \subseteq (R^{-1} A)^\perp$ and $(\ker Q)^\perp \subseteq (K^{-1} A')^\perp$ we conclude that the derivative of $\ell_U$ at $Q'$ in the direction $xy^T$ vanishes, in particular, when $xy^T$ lies in the tangent space at $Q'$ to $\mathcal{M}_k$. Hence $Q'$ is a critical point for $\ell_U$ on $\mathcal{M}_k$.

Finally, we need to show that the generic rank $k$ of $Q$ thus obtained (from a sufficiently general $U$ and a critical point $P \in \mathcal{M}_r$ of $\ell_U$) equals $m - r + 1$, rather than being strictly smaller. For this, observe that we have constructed, for any $r \in [m]$, a rational map of irreducible varieties

$$\varphi_r : \mathrm{Crit}(\mathcal{M}_r) \dashrightarrow \mathrm{Crit}(\mathcal{M}_{f(r)}), \quad (P, U) \mapsto \left( \frac{1}{(u_{++})^3} \frac{RUK}{P},\ U \right) = (Q', U),$$

where $\frac{RUK}{P}$ denotes the entry-wise quotient and $f : [m] \to [m]$ maps $r$ to the generic rank of the matrix $Q'$ as $(P, U)$ varies over $\mathrm{Crit}(\mathcal{M}_r)$. Since $\varphi_r$ commutes with the projection on the second factor, its image has dimension $mn$, hence $\varphi_r$ is dominant. But it is also injective: in fact, $(P, U)$ can be recovered from $(Q', U)$ with the exact same formula. This shows that $\varphi_r$ is birational, and that $\varphi_{f(r)}$ is its inverse as a birational map. In particular, $f(f(r)) = r$, so that $f$ is a bijection. But the only bijection $f : [m] \to [m]$ with the property that $f(r) \le m - r + 1$ for all $r$ is $r \mapsto m - r + 1$. Indeed, if $r$ were the smallest value for which $f(r) \neq m - r + 1$, then $m - r + 1$ would not be in the image of $f$. This concludes the proof of the theorem. □
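The combinatorial fact used in the last step can be confirmed by brute force for small $m$. This is only an illustrative sketch (the function name is hypothetical): it enumerates all bijections $f$ of $\{1, \ldots, m\}$ and keeps those with $f(r) \le m - r + 1$ for every $r$.

```python
from itertools import permutations


def constrained_bijections(m):
    """All bijections f of {1, ..., m}, as tuples (f(1), ..., f(m)),
    satisfying f(r) <= m - r + 1 for every r."""
    return [f for f in permutations(range(1, m + 1))
            if all(f[r - 1] <= m - r + 1 for r in range(1, m + 1))]


for m in range(1, 7):
    print(m, constrained_bijections(m))
```

For each $m$ tested, the only surviving tuple is the reversal $(m, m-1, \ldots, 1)$, i.e. $r \mapsto m - r + 1$.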



Remark 7. It can happen that the rank of $Q$ is strictly smaller than $m - r + 1$, but the proof above shows that for sufficiently general $U$ this does not happen. For example, in the rectangular case where $m = n = 4$, if we have that

$U$ = [$4 \times 4$ data matrix; entries not legible here] and $P = \frac{1}{80} \cdot$ [$4 \times 4$ matrix with entries of the form $5 \pm \sqrt{5}$, $4 \pm 2i$, $6 \pm 2i$; arrangement not legible here],

then there exist ML-degree many points in $\mathrm{Crit}(\mathcal{M}_2)$ with this choice of $U$, and it can be shown that $(P, U) \in \mathrm{Crit}(\mathcal{M}_2)$ is one such point. Because $u_{++} = 1$ we have $Q = Q'$, and

$Q = \frac{1}{500} \cdot$ [$4 \times 4$ matrix with entries of the form $5 \pm \sqrt{5}$, $4 \pm 2i$, $6 \pm 2i$; arrangement not legible here]

satisfies $p_{ij} q_{ij} = \frac{u_{i+} u_{ij} u_{+j}}{(u_{++})^3}$. In this case, $Q$ has rank 2 instead of rank 3. This is an important fact for numerical computations. If we were to use the homotopy methods as in [HRS12] to find the critical points of $\ell_U$ on $\mathcal{M}_3$, we would track a path from a generic point of $\mathrm{Crit}(\mathcal{M}_3)$ to the point $(Q, U)$. Since $Q$ has rank less than 3, this will correspond to tracking a path to a singularity, leading to numerical difficulties. But by determining all critical points of $\ell_U$ on $\mathcal{M}_2$, we avoid these numerical difficulties. To determine the points of $\mathrm{Crit}(\mathcal{M}_3)$ with $U$ as above, we use the equation relating $p_{ij}$ and $q_{ij}$ and determine which $(q_{ij})$ have rank 3.


3. Maximum likelihood duality in the symmetric case 



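Let $m$ be a natural number and let $\mathcal{SM}_r$ denote the variety of symmetric $m \times m$-matrices of rank $r$ whose entries sum to 2. A point $P$ of $\mathcal{SM}_r$ and data matrix $U$ will be denoted by

$$P = \begin{pmatrix} 2p_{11} & p_{12} & \cdots & p_{1m} \\ p_{12} & 2p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{1m} & p_{2m} & \cdots & 2p_{mm} \end{pmatrix} \quad \text{and} \quad U = \begin{pmatrix} 2u_{11} & u_{12} & \cdots & u_{1m} \\ u_{12} & 2u_{22} & \cdots & u_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ u_{1m} & u_{2m} & \cdots & 2u_{mm} \end{pmatrix}.$$

We denote the $(i,j)$-entries of $P$ and $U$ by $P_{ij}$ and $U_{ij}$ to distinguish them from the $p_{ij}$ and $u_{ij}$, respectively. Recall that the likelihood function in the symmetric case is defined as $\ell_U(P) := \prod_{i \le j} p_{ij}^{u_{ij}}$, which in terms of the entries of $P$ equals $\big( \prod_{i < j} P_{ij}^{U_{ij}} \big) \cdot \big( \prod_i (P_{ii}/2)^{U_{ii}/2} \big)$. From now on we fix a sufficiently general data matrix $U$ and a critical point $P$ for $\ell_U$ on $\mathcal{SM}_r$. The tangent space $T_P \mathcal{SM}_r$ equals

(2) $T_P \mathcal{SM}_r = \{ X \in \mathbb{C}^{m \times m} \text{ symmetric} \mid X \ker P \subseteq \operatorname{im} P \text{ and } \sum_{ij} X_{ij} = 0 \}.$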



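Given a tangent vector $X \in T_P \mathcal{SM}_r$, the derivative of $\ell_U$ in that direction equals

$$\sum_{i < j} \frac{X_{ij} U_{ij}}{P_{ij}} + \sum_i \frac{(X_{ii}/2)\, u_{ii}}{P_{ii}/2} = \sum_{i \le j} \frac{x_{ij} u_{ij}}{p_{ij}}$$

(up to a factor irrelevant for its vanishing), where the $x_{ij}$ are related to the entries $X_{ij}$ as the $p_{ij}$ are to the $P_{ij}$. We set

$$U_{i+} := \sum_j U_{ij} \quad \text{and} \quad U_{++} := \sum_i \sum_j U_{ij},$$

and similarly for $P$. The symmetric analogue of Lemma 3 is the following.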

Lemma 8. The vector $P\mathbf{1}$ is a non-zero scalar multiple of $U\mathbf{1}$.

Proof. We need to prove that the $m \times 2$-matrix $(P\mathbf{1} \mid U\mathbf{1})$ has $2 \times 2$-minors equal to zero. We prove this for the minor in the first two rows. Set $a := P_{1+}$ and $b := P_{2+}$, and define $v_1, v_2 \in \mathbb{C}^m$ by $v_1 = (b, 0, 0, \ldots, 0)^T$, $v_2 = (0, a, 0, \ldots, 0)^T$. Let $w_1, w_2$ be the first and second column of $P$, respectively. Then for each $i = 1, 2$ the matrix $X^{(i)} := v_i w_i^T + w_i v_i^T$ lies in the tangent space at $P$ to the variety of symmetric rank-$r$ matrices, and the difference $X := X^{(1)} - X^{(2)}$ has sum of entries equal to 0 and therefore lies in $T_P \mathcal{SM}_r$. The symmetric matrix $X$ looks like

$$X = \begin{pmatrix} 2bP_{11} & (b - a)P_{12} & bP_{13} & \cdots & bP_{1m} \\ * & -2aP_{22} & -aP_{23} & \cdots & -aP_{2m} \\ * & * & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ * & * & 0 & \cdots & 0 \end{pmatrix}.$$

The derivative of $\ell_U$ at $P$ in the direction $X$ equals

$$\sum_{i \le j} \frac{x_{ij} u_{ij}}{p_{ij}} = b\, U_{1+} - a\, U_{2+},$$

and this derivative vanishes by criticality of $P$. The relevant scalar multiple is $\frac{P_{++}}{U_{++}} = \frac{2}{U_{++}}$, which is non-zero. □




The analogue of $R, K$ from the rectangular case is $R := \operatorname{diag}(U_{1+}, \ldots, U_{m+})$ and $K := \operatorname{diag}(U_{+1}, \ldots, U_{+m})$. Note $R = K$ because $U$ is symmetric, but we keep this notation to mirror the rectangular case. As in the rectangular case, define the symmetric matrix $Q$ by $P * Q = RUR$, i.e., $P_{ij} Q_{ij} = U_{i+} U_{ij} U_{j+}$ for $i, j \in [m]$. This will be our dual critical point, up to a normalizing factor to be determined now.

Lemma 9. The sum $\sum_{ij} Q_{ij}$ equals $\frac{(U_{++})^3}{2}$.

Proof. By Lemma 8 the rank-one matrix $Y$ with entries $Y_{ij} = U_{i+} U_{j+}$ has image contained in $\operatorname{im} P$, and so does $P$. So we can decompose $Y = cP + X$ with $c \in \mathbb{C}$ and $X \in T_P \mathcal{SM}_r$, and we find

$$\sum_{ij} Q_{ij} = \sum_{ij} \frac{Y_{ij} U_{ij}}{P_{ij}} = \sum_{ij} \frac{c\, P_{ij} U_{ij}}{P_{ij}} + \sum_{ij} \frac{X_{ij} U_{ij}}{P_{ij}} = c\, U_{++} + 0 = c\, U_{++}.$$

Moreover, the scalar $c$ equals $\frac{Y_{++}}{P_{++}} = \frac{(U_{++})^2}{2}$, which shows that $Q_{++} = \frac{(U_{++})^3}{2}$. □

As in the rectangular case, we will make use of low-rank elements in $T_P \mathcal{SM}_r$, where now "low rank" means rank two.

Lemma 10. The tangent space $T_P \mathcal{SM}_r$ is spanned by all matrices of the form $vw^T + wv^T$ with $v \in \operatorname{im}(P)$ and $w \in \mathbb{C}^m$, with the additional constraint that the sum of all entries is zero, i.e., that one of $v$ and $w$ is perpendicular to $\mathbf{1}$.

In the proof we will implicitly use that $\operatorname{im} P$ is not contained in $\mathbf{1}^\perp$, which is true by genericity of $U$.

Proof. The proof is similar to that of Lemma 5. First, the matrices in the lemma satisfy the conditions characterizing $T_P \mathcal{SM}_r$; see (2). Second, to show that they span that tangent space, split $\mathbb{C}^m$ as $A \oplus B \oplus C$ with $A \oplus B = \operatorname{im} P$ and $A \oplus C = \mathbf{1}^\perp$, so that the second symmetric power $S^2 \mathbb{C}^m$ equals

$$S^2(A) \oplus S^2(B) \oplus S^2(C) \oplus (A \otimes B) \oplus (A \otimes C) \oplus (B \otimes C).$$

The matrices in the lemma span $S^2(A) + A \otimes B + (A \oplus B) \otimes C$. This space has dimension $\binom{r}{2} + (r - 1) + r(m - r)$, which equals $\binom{r+1}{2} + r(m - r) - 1 = \dim \mathcal{SM}_r$. □

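By Lemma 10 it suffices to understand the derivative $\sum_{i \le j} \frac{x_{ij} u_{ij}}{p_{ij}}$ for $X$ equal to $vw^T + wv^T$, in which case it equals

$$v^T \begin{pmatrix} \frac{2u_{11}}{P_{11}} & \frac{u_{12}}{P_{12}} & \cdots & \frac{u_{1m}}{P_{1m}} \\ \frac{u_{12}}{P_{12}} & \frac{2u_{22}}{P_{22}} & \cdots & \frac{u_{2m}}{P_{2m}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{u_{1m}}{P_{1m}} & \frac{u_{2m}}{P_{2m}} & \cdots & \frac{2u_{mm}}{P_{mm}} \end{pmatrix} w.$$

The right-hand side can be concisely written as $v^T \big( \frac{U}{P} \big) w$, where $\frac{U}{P}$ is the Hadamard (element-wise) quotient of $U$ by $P$. So criticality of $P$ is equivalent to the statement that $v^T \big( \frac{U}{P} \big) w$ vanishes for all $v, w$ as in Lemma 10. This, in turn, is equivalent to the condition that $v^T R^{-1} Q R^{-1} w = 0$ for all $v, w$ as in Lemma 10. We now state and prove our duality result in the symmetric case.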

Theorem 11 (ML-duality for symmetric matrices). Let $U \in \mathbb{N}^{m \times m}$ be a sufficiently general symmetric data matrix, and let $P$ be a critical point of $\ell_U$ on $\mathcal{SM}_r$. Define the matrix $Q$ by $P_{ij} Q_{ij} = U_{i+} U_{ij} U_{j+}$. Then $4Q / (U_{++})^3$ is a critical point of $\ell_U$ on $\mathcal{SM}_{m-r+1}$.

As in the rectangular case, the map $P \mapsto Q' := 4Q / (U_{++})^3$ is a bijection by virtue of the symmetry in $P$ and $Q$, and the same conclusions for the critical points with positive real entries can be drawn as in the rectangular case.
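As in the rectangular case, the extreme rank $r = m$ offers a consistency check. There the scaled matrix $P = 2U / U_{++}$ is the critical point, and the formula of the theorem should return the rank-one symmetric matrix with entries $2 U_{i+} U_{j+} / (U_{++})^2$, whose entries add up to 2. The sketch below assumes a hypothetical symmetric matrix `U` with even diagonal entries (so that $U_{ii} = 2u_{ii}$); the function name is illustrative only.

```python
from fractions import Fraction


def symmetric_dual(U, P):
    """Q' = 4 Q / U_{++}^3 where P_ij * Q_ij = U_{i+} * U_ij * U_{j+}."""
    row = [sum(r) for r in U]
    total = sum(row)
    m = len(U)
    return [[Fraction(4 * row[i] * U[i][j] * row[j]) / (P[i][j] * total ** 3)
             for j in range(m)] for i in range(m)]


# Hypothetical symmetric data matrix in the paper's convention (diagonal = 2 * u_ii).
U = [[6, 1, 2],
     [1, 2, 4],
     [2, 4, 4]]
row = [sum(r) for r in U]
total = sum(row)
P = [[Fraction(2 * u, total) for u in r] for r in U]   # entries add up to 2
Q_dual = symmetric_dual(U, P)
expected = [[Fraction(2 * ri * rj, total ** 2) for rj in row] for ri in row]
print(Q_dual == expected, sum(map(sum, Q_dual)))
```

The check also reflects the normalization of Lemma 9: before scaling by $4 / (U_{++})^3$, the entries of $Q$ add up to $(U_{++})^3 / 2$.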

Proof. The normalizing factor was dealt with in Lemma 9 and will be largely ignored in what follows. As in the proof of Lemma 10, decompose $\mathbb{C}^m$ as $A \oplus B \oplus C$ with $A \oplus B = \operatorname{im} P$ and $A \oplus C = \mathbf{1}^\perp$. So $A$ has dimension $r - 1$, $C$ has dimension $m - r$, and $B$ has dimension 1. We take $B$ to be spanned by $P\mathbf{1}$, which is a non-zero scalar multiple of $R\mathbf{1}$ by Lemma 8.

First we bound the rank of $Q$. To do so we prove that the image of $Q$ is contained in a space of dimension $m - r + 1$. Indeed, by criticality of $P$ we have $v^T R^{-1} Q K^{-1} w = 0$ for $w \in \mathbb{C}^m$, $v \in \operatorname{im} P$ such that $v \perp \mathbf{1}$ or $w \perp \mathbf{1}$. Taking $w$ arbitrary and $v$ in $A$, we find that $\operatorname{im} Q \subseteq (R^{-1} A)^\perp$, which has dimension $m - r + 1$.

Next we show that

$$x^T R^{-1} P K^{-1} y = 0$$

for any $x \in (R^{-1} A)^\perp$ and $y \in \mathbb{C}^m$ with $x \perp \mathbf{1}$ or $y \perp \mathbf{1}$. First, suppose $x \perp \mathbf{1}$. Since $P K^{-1} y$ may be written as $a + c R \mathbf{1}$ with $a \in A$ and scalar $c$, we find

$$x^T R^{-1} P K^{-1} y = x^T R^{-1} a + c\, x^T R^{-1} R \mathbf{1} = x^T R^{-1} a = 0.$$

Otherwise, we have $y \perp \mathbf{1}$ and we may assume $x = c R \mathbf{1}$ with $c$ a scalar. In this case, we have

$$x^T R^{-1} P K^{-1} y = c\, \mathbf{1}^T P K^{-1} y = \tfrac{2c}{U_{++}} \mathbf{1}^T K K^{-1} y = \tfrac{2c}{U_{++}} \mathbf{1}^T y = 0,$$

where we use Lemma 8.

Let $k$ be the rank of $Q$. Since $\operatorname{im} Q \subseteq (R^{-1} A)^\perp$ we conclude that $x^T R^{-1} P K^{-1} y = 0$ holds, in particular, for all matrices $xy^T + yx^T$ spanning the tangent space to $\mathcal{SM}_k$ at $Q'$, so that $Q'$ is critical. By reversing the roles of $P$ and $Q$ and using the involution argument at the end of the proof of Theorem 6, we conclude that for generic $U$ the value of $k$ equals $m - r + 1$ (rather than being strictly smaller). This proves the theorem. □



4. Duality in the skew-symmetric case 

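The skew-symmetric case, while perhaps not of direct relevance to statistics, is of considerable algebro-geometric interest [HKS05], since the variety $\mathcal{AM}_r$, consisting of skew-symmetric matrices of even rank $r$ whose upper-triangular entries are non-zero and add up to 1, is (an open subset of a hyperplane section of the affine cone over) a secant variety of the Grassmannian of 2-spaces in $\mathbb{C}^m$. Recall that we want to prove that $\mathcal{AM}_r$ (the intersection of a determinantal variety with an affine hyperplane) is ML-dual to the affine translate $\mathcal{AM}'_s$ of a determinantal variety.

A point $P$ of $\mathcal{AM}_r$ and data matrix $U$ will be denoted by

$$P = \begin{pmatrix} 0 & p_{12} & \cdots & p_{1m} \\ -p_{12} & 0 & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ -p_{1m} & -p_{2m} & \cdots & 0 \end{pmatrix} \quad \text{and} \quad U = \begin{pmatrix} 0 & u_{12} & \cdots & u_{1m} \\ u_{12} & 0 & \cdots & u_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ u_{1m} & u_{2m} & \cdots & 0 \end{pmatrix}.$$

Note that $U$ is symmetric rather than alternating. We fix a sufficiently general data matrix $U$ and a critical point $P$ for $\ell_U$ on $\mathcal{AM}_r$. The tangent space $T_P \mathcal{AM}_r$ equals

(3) $T_P \mathcal{AM}_r = \{ X \in \mathbb{C}^{m \times m} \text{ skew} \mid X \ker P \subseteq \operatorname{im} P \text{ and } \sum_{i < j} x_{ij} = 0 \}.$
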

The derivative of $\ell_U$ at $P$ in the direction $X$ equals $\sum_{i < j} \frac{x_{ij} u_{ij}}{p_{ij}}$, up to a factor irrelevant for its vanishing. The following lemma is the skew analogue of Lemmas 3 and 8.

Lemma 12. The vector $a = \big( \sum_{j < i} p_{ji} + \sum_{j > i} p_{ij} \big)_i$ is a scalar multiple of $U\mathbf{1}$.


Proof. We need to show that the $2 \times 2$-minors of the matrix $(a \mid U\mathbf{1})$ are zero, and do so for the first minor. Let $v_1, v_2$ be the first and second column of $P$, respectively, and set $w_1 := (a_2, 0, \ldots, 0)^T$ and $w_2 := (0, -a_1, 0, \ldots, 0)^T$. Then each of the matrices $v_i w_i^T - w_i v_i^T$ is tangent at $P$ to the variety of skew-symmetric rank-$r$ matrices, and their sum

$$X = \begin{pmatrix} 0 & (a_2 - a_1) p_{12} & a_2 p_{13} & \cdots & a_2 p_{1m} \\ -(a_2 - a_1) p_{12} & 0 & -a_1 p_{23} & \cdots & -a_1 p_{2m} \\ -a_2 p_{13} & a_1 p_{23} & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ -a_2 p_{1m} & a_1 p_{2m} & 0 & \cdots & 0 \end{pmatrix}$$

has upper-triangular entries adding up to 0, so that $X$ is tangent at $P$ to $\mathcal{AM}_r$. The derivative of $\ell_U$ at $P$ in the direction $X$, which is zero by criticality of $P$, equals

$$(a_2 - a_1) u_{12} + a_2 u_{13} + \ldots + a_2 u_{1m} - a_1 u_{23} - \ldots - a_1 u_{2m} = a_2 u_{1+} - a_1 u_{2+},$$

which is the minor whose vanishing was required. □

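Next we determine rank-two elements spanning $T_P \mathcal{AM}_r$. For this we introduce the skew bilinear form $\langle \cdot, \cdot \rangle$ on $\mathbb{C}^m$ defined by $\langle v, w \rangle = v^T S w = \sum_{i < j} (v_i w_j - v_j w_i)$, where $S$ is the skew-symmetric matrix

$$S = \begin{pmatrix} 0 & 1 & \cdots & 1 & 1 \\ -1 & 0 & \cdots & 1 & 1 \\ \vdots & & \ddots & & \vdots \\ -1 & -1 & \cdots & 0 & 1 \\ -1 & -1 & \cdots & -1 & 0 \end{pmatrix}$$

from the introduction. By elementary linear algebra, this form is non-degenerate if $m$ is even and has a one-dimensional radical spanned by $(1, -1, 1, -1, \ldots, 1) \in \mathbb{C}^m$ if $m$ is odd.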
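These statements about the form defined by $S$ are easy to confirm for small $m$. The following sketch (function names are illustrative) builds $S$, compares $v^T S w$ with $\sum_{i<j} (v_i w_j - v_j w_i)$, computes the rank of $S$ by exact Gaussian elimination, and checks that the alternating vector lies in the kernel when $m$ is odd.

```python
from fractions import Fraction


def skew_S(m):
    """The m x m matrix with +1 above the diagonal, -1 below, and 0 on it."""
    return [[(j > i) - (j < i) for j in range(m)] for i in range(m)]


def form(v, w):
    """The skew form sum_{i<j} (v_i w_j - v_j w_i)."""
    m = len(v)
    return sum(v[i] * w[j] - v[j] * w[i] for i in range(m) for j in range(i + 1, m))


def rank(M):
    """Rank of an integer matrix via Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    rk = 0
    for col in range(len(A[0])):
        pivot = next((r for r in range(rk, len(A)) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[rk], A[pivot] = A[pivot], A[rk]
        for r in range(rk + 1, len(A)):
            factor = A[r][col] / A[rk][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[rk])]
        rk += 1
    return rk


for m in range(2, 7):
    S = skew_S(m)
    alternating = [(-1) ** i for i in range(m)]   # (1, -1, 1, ...)
    image = [sum(S[i][j] * alternating[j] for j in range(m)) for i in range(m)]
    print(m, rank(S), image)
```

For even $m$ the printed rank is $m$; for odd $m$ it is $m - 1$ and the image of the alternating vector is zero.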

In what follows, it will be convenient to think of skew-symmetric matrices also as elements of $\bigwedge^2 \mathbb{C}^m$ or as alternating tensors.

Lemma 13. The tangent space $T_P \mathcal{AM}_r$ is spanned by skew-symmetric matrices of the form $vw^T - wv^T$ with $v \in \operatorname{im} P$ and $\langle v, w \rangle = 0$.

In the proof we will use that $\operatorname{im} P$ is non-degenerate with respect to $\langle \cdot, \cdot \rangle$. This condition will be satisfied for general $U$.

Proof. The proof is similar to the symmetric case and the rectangular case: a skew-symmetric matrix $X$ lies in the tangent space if and only if $X \ker P \subseteq \operatorname{im} P$ and $\sum_{i < j} x_{ij} = 0$. The condition $v \in \operatorname{im} P$ ensures the first property and the condition that $\langle v, w \rangle = 0$ ensures the second property.

To complete the proof, decompose $\mathbb{C}^m$ as $A \oplus C$ with $A = \operatorname{im} P$ and $\langle A, C \rangle = 0$, so that $\bigwedge^2 \mathbb{C}^m$ decomposes as $\bigwedge^2 A \oplus (A \otimes C) \oplus \bigwedge^2 C$. Taking the vector $w$ in $vw^T - wv^T$ from $C$ we see that $A \otimes C$ is contained in the span of the matrices in the lemma. Next we argue that a codimension-one subspace of $\bigwedge^2 A$ is also contained in their span. Indeed, the (non-zero) tensors $vw^T - wv^T \in \bigwedge^2 A$ with $v, w \in A$ perpendicular with respect to $\langle \cdot, \cdot \rangle$ form a single orbit under the symplectic group $\mathrm{Sp}(A) = \mathrm{Sp}_r$ (recall that $r$ is even, so that this is a reductive group), and hence their span is an $\mathrm{Sp}(A)$-submodule of $\bigwedge^2 A$. But $\bigwedge^2 A$ splits as a direct sum of only two irreducible modules under $\mathrm{Sp}(A)$: a one-dimensional trivial module corresponding to (the restriction of) $\langle \cdot, \cdot \rangle$ and a codimension-one module. Hence the tensors $vw^T - wv^T$ must span that codimension-one module.

Summarizing, we find that the matrices in the lemma span a space of dimension $r(m - r) + \binom{r}{2} - 1$, which equals $\dim \mathcal{AM}_r$. □

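Recall that in the alternating case the likelihood function is given by $\ell_U(P) = \prod_{i < j} p_{ij}^{u_{ij}}$. The derivative of this expression in the direction of a skew-symmetric matrix $X$ of the form $vw^T - wv^T$ equals (up to a factor irrelevant for its vanishing)

$$\sum_{i < j} x_{ij} \frac{u_{ij}}{p_{ij}} = \sum_{i < j} \frac{u_{ij}}{p_{ij}} (v_i w_j - v_j w_i) = v^T \begin{pmatrix} 0 & \frac{u_{12}}{p_{12}} & \cdots & \frac{u_{1m}}{p_{1m}} \\ -\frac{u_{12}}{p_{12}} & 0 & \cdots & \frac{u_{2m}}{p_{2m}} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{u_{1m}}{p_{1m}} & -\frac{u_{2m}}{p_{2m}} & \cdots & 0 \end{pmatrix} w.$$

Define the skew matrix $Q$ by $P * Q = U$. Then criticality of $P$ translates into $v^T Q w = 0$ for all $v \in \operatorname{im} P$ and $w \in \mathbb{C}^m$ with $\langle v, w \rangle = 0$.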

Theorem 14 (ML-duality for skew matrices). Let $U = (U_{ij})_{ij}$ be a sufficiently general symmetric data matrix with zeroes on the diagonal, and let $P$ be a critical point of $\ell_U$ on $\mathcal{AM}_r$, where $r \in \{2, \ldots, m\}$ is even. Let $s \in \{0, \ldots, m - 2\}$ be the largest even integer less than or equal to $m - r$. Define the matrix $Q$ by $P * Q = U$. Then the skew matrix $Q' := 2Q / U_{++}$ is a critical point of $\ell_U$ on the translated determinantal variety $\mathcal{AM}'_s$. Moreover, the map $P \mapsto Q'$ is a bijection between the critical points of $\ell_U$ on $\mathcal{AM}_r$ and those on $\mathcal{AM}'_s$.

As in the rectangular and symmetric cases, the bijection $P \mapsto Q'$ maps real, positive critical points to real, positive critical points in such a way that the sum of the log-likelihoods of $P$ and $Q'$ is constant.

Proof. By construction of $Q$ we have $v^T Q w = 0$ for all $v \in \operatorname{im} P$
and $w \in \mathbb{C}^m$ with $v^T S w = 0$. This means that the bilinear form
$(v, w) \mapsto v^T Q w$ on $\operatorname{im} P \times \mathbb{C}^m$ is a scalar multiple
of the bilinear form $(v, w) \mapsto v^T S w$, denoted $(\cdot, \cdot)$ earlier, on that
same space. The scalar is found by computing

$$(0, -p_{12}, \ldots, -p_{1m})\, Q\, (1, 0, \ldots, 0)^T = U_{1+}$$

and

$$(0, -p_{12}, \ldots, -p_{1m})\, S\, (1, 0, \ldots, 0)^T = P_{1+} = a_1,$$

where $a$ is the vector of Lemma 12. Using that lemma and the fact that
$\sum_i a_i = 2$ we find that $a_1 = 2U_{1+}/U_{++}$, so the scalar equals
$U_{1+}/a_1 = U_{++}/2$. We conclude that the skew bilinear form associated to
$B := S - \frac{2}{U_{++}} Q$ is identically zero on
$\operatorname{im} P \times \mathbb{C}^m$, hence $\ker B$ contains $\operatorname{im} P$ and
$\operatorname{im} B = (\ker B)^\perp$ (where $\perp$ refers to the standard bilinear form
on $\mathbb{C}^m$) is contained in $\ker P = (\operatorname{im} P)^\perp$. In particular,
$B$ has rank at most $s$; let $k \le s$ denote the actual rank of $B$.
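
The two facts used in this step can be checked numerically on the data of
Example 15, whose entries of $P$ are rounded to four decimals, so only approximate
equality is expected. The sketch below assumes that $a_i$ is the sum of the
upper-triangular entries of $P$ that involve the index $i$, which matches
$a_1 = P_{1+}$ as used in the proof; the variable names are arbitrary choices.

```python
m = 4
# Data of Example 15: U = (1/41) * integer table, P rounded to 4 decimals
U = [[x / 41 for x in row] for row in
     [[0, 2, 3, 5], [2, 0, 7, 11], [3, 7, 0, 13], [5, 11, 13, 0]]]
upper_P = {(0, 1): 0.0386, (0, 2): 0.0978, (0, 3): 0.1075,
           (1, 2): 0.1563, (1, 3): 0.2929, (2, 3): 0.3069}
P = [[0.0] * m for _ in range(m)]
for (i, j), p in upper_P.items():
    P[i][j], P[j][i] = p, -p

U_pp = sum(map(sum, U))                      # U_{++}
# S has +1 above the diagonal and -1 below
S = [[(j > i) - (j < i) for j in range(m)] for i in range(m)]
# B = S - (2/U_{++}) Q with q_ij = u_ij / p_ij off the diagonal
B = [[S[i][j] - 2 / U_pp * U[i][j] / P[i][j] if i != j else 0.0
      for j in range(m)] for i in range(m)]

# Assumed reading of the vector a of Lemma 12: sum of the upper-triangular
# entries of P that involve the index k
a = [sum(p for (i, j), p in upper_P.items() if k in (i, j)) for k in range(m)]
marginal_gap = max(abs(a[k] - 2 * sum(U[k]) / U_pp) for k in range(m))

# im P is spanned by the columns of P, so ker B contains im P iff B P = 0
BP = [[sum(B[i][k] * P[k][j] for k in range(m)) for j in range(m)]
      for i in range(m)]
kernel_gap = max(abs(x) for row in BP for x in row)

print(marginal_gap, kernel_gap)
```

Both printed gaps should be small compared with the entries of $P$; they are not
exactly zero only because of the rounding in the example.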

Next we argue that $Q' := \frac{2}{U_{++}} Q$ is critical for $\ell_U$ on
$A\mathcal{M}'_k$. By arguments similar to (but easier than) those in Lemma 13 the
tangent space $T_{Q'} A\mathcal{M}'_k$ is spanned by rank-two matrices $vw^T - wv^T$
with $v \in \operatorname{im} B$ and $w \in \mathbb{C}^m$ arbitrary. Thus proving that
$Q'$ is critical boils down to proving that $v^T P w = 0$ for all
$v \in \operatorname{im} B$ and $w \in \mathbb{C}^m$. But this is immediate from
$\operatorname{im} B \subseteq \ker P$. Thus $Q'$ is critical.

Furthermore, we need to show that (for generic $U$) the rank $k$ of $B = S - Q'$ is
equal to $s$ rather than strictly smaller, and that the map $P \mapsto Q'$, which is
clearly injective, is also surjective on the set of critical points for $\ell_U$ on
$A\mathcal{M}'_s$. For these purposes we reverse the arguments above: assume that $Q'$
is a critical point on $A\mathcal{M}'_k$, where $k$ is an even integer in the range
$\{0, \ldots, m - 2\}$. Define $Q := \frac{U_{++}}{2} Q'$ and define $P$ by
$P * Q = U$. Also, define $B := S - Q'$. Then criticality of $Q'$ implies that
$v^T P w = 0$ for all $v \in \operatorname{im} B$ and $w \in \mathbb{C}^m$, and this
implies that $\ker P \supseteq \operatorname{im} B$. Thus $l := \operatorname{rk} P$ is at
most $m - k$.

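Moreover, $B$ itself lies in the tangent space $T_{Q'} A\mathcal{M}'_k$, and
criticality of $Q'$ implies that $\sum_{i<j} B_{ij} \frac{u_{ij}}{q'_{ij}} = 0$.
Substituting the expression for $B$ into this we find that

$$0 = \sum_{i<j} (1 - q'_{ij}) \frac{u_{ij}}{q'_{ij}}
    = \frac{U_{++}}{2} \sum_{i<j} \Bigl( p_{ij} - \frac{2u_{ij}}{U_{++}} \Bigr)
    = \frac{U_{++}}{2} \Bigl( \sum_{i<j} p_{ij} - 1 \Bigr),$$

i.e., the upper-triangular entries of $P$ add up to one. We conclude that $P$ lies in
$A\mathcal{M}_l$. Next, we argue that $P$ is critical. Indeed, for
$v \in \operatorname{im} P$ and $w \in \mathbb{C}^m$ such that
$(v, w) = (v^T S w =)\, 0$ we find

$$v^T Q w = v^T \Bigl( \frac{U_{++}}{2} (S - B) \Bigr) w
          = \frac{U_{++}}{2} \bigl( v^T S w - v^T B w \bigr) = 0 - 0 = 0,$$

where we have used that $\operatorname{im} P \subseteq \ker B$.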
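
The step showing that the upper-triangular entries of $P$ add up to one rests on
the identity
$\sum_{i<j} (1 - q'_{ij})\, u_{ij}/q'_{ij} = \frac{U_{++}}{2} \bigl( \sum_{i<j} p_{ij} - 1 \bigr)$,
valid whenever $P * Q = U$ and $Q = \frac{U_{++}}{2} Q'$. A minimal sketch that
confirms it in exact rational arithmetic follows; the function name `both_sides`
and the sample values are hypothetical and chosen only to exercise the identity.

```python
from fractions import Fraction
from itertools import combinations

def both_sides(u, q_prime):
    """u, q_prime: dicts mapping pairs (i, j), i < j, to upper-triangular entries."""
    U_pp = 2 * sum(u.values())                 # U symmetric with zero diagonal
    # P * Q = U with Q = (U_pp / 2) Q', so p_ij = 2 u_ij / (U_pp q'_ij)
    p = {ij: 2 * u[ij] / (U_pp * q_prime[ij]) for ij in u}
    lhs = sum((1 - q_prime[ij]) * u[ij] / q_prime[ij] for ij in u)
    rhs = U_pp / 2 * (sum(p.values()) - 1)
    return lhs, rhs

pairs = list(combinations(range(4), 2))
# Hypothetical sample values; any nonzero q'_ij would do
u = {ij: Fraction(n) for ij, n in zip(pairs, [2, 3, 5, 7, 11, 13])}
q_prime = {ij: Fraction(n, 7) for ij, n in zip(pairs, [3, -4, 9, 10, -1, 6])}

lhs, rhs = both_sides(u, q_prime)
print(lhs == rhs)
```

Since $U_{++} \neq 0$ for a general data matrix, the left-hand side vanishes
exactly when $\sum_{i<j} p_{ij} = 1$.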

Summarizing, we have found rational maps

$$\psi_r : \operatorname{Crit}(A\mathcal{M}_r) \dashrightarrow \operatorname{Crit}(A\mathcal{M}'_{f(r)}),
  \quad (P, U) \mapsto \Bigl( \frac{2U}{U_{++} P}, U \Bigr) \quad \text{and}$$

$$\psi'_k : \operatorname{Crit}(A\mathcal{M}'_k) \dashrightarrow \operatorname{Crit}(A\mathcal{M}_{g(k)}),
  \quad (Q', U) \mapsto \Bigl( \frac{2U}{U_{++} Q'}, U \Bigr)$$

for some map $f$ mapping even integers $r \in \{2, \ldots, m\}$ to even integers
$k \in \{0, \ldots, m - 2\}$, and some map $g$ in the opposite direction. By the
argument in the proof of Theorem 5 both $\psi_r$ and $\psi'_k$ are birational and
$g(f(r)) = r$. Hence $f$ is a bijection, and by the above it satisfies
$f(r) \le m - r$. The only such bijection is the one that maps $r$ to the largest even
integer less than or equal to $m - r$. This concludes the proof of the theorem. □
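
The final combinatorial step can be confirmed by brute force for small $m$: among
all bijections $f$ from the even integers in $\{2, \ldots, m\}$ to the even
integers in $\{0, \ldots, m - 2\}$, exactly one satisfies $f(r) \le m - r$ for
every $r$. The function names and the range of $m$ tested below are illustrative
choices, not part of the paper.

```python
from itertools import permutations

def admissible_bijections(m):
    """All bijections f: even r in {2..m} -> even k in {0..m-2} with f(r) <= m - r."""
    domain = list(range(2, m + 1, 2))
    codomain = list(range(0, m - 1, 2))
    return [dict(zip(domain, image))
            for image in permutations(codomain)
            if all(k <= m - r for r, k in zip(domain, image))]

def largest_even_at_most(x):
    """Largest even integer less than or equal to the nonnegative integer x."""
    return x - (x % 2)

for m in range(2, 11):
    found = admissible_bijections(m)
    expected = {r: largest_even_at_most(m - r) for r in range(2, m + 1, 2)}
    # True means: exactly one admissible bijection, and it is the expected one
    print(m, found == [expected])
```

The search is exhaustive, so it only serves as a check for the listed values of
$m$; the general statement follows by assigning values from the largest $r$ down.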

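Example 15. Now we give an explicit example illustrating dual solutions in the
alternating case. When $m = 4$ the ML-degree of $A\mathcal{M}_2$ is 4 [HKS05]. When

$$U = \frac{1}{41} \begin{bmatrix}
      0 & 2 & 3 & 5 \\
      2 & 0 & 7 & 11 \\
      3 & 7 & 0 & 13 \\
      5 & 11 & 13 & 0
    \end{bmatrix}
  \quad \text{and} \quad
  P = \begin{bmatrix}
      0 & 0.0386 & 0.0978 & 0.1075 \\
      -0.0386 & 0 & 0.1563 & 0.2929 \\
      -0.0978 & -0.1563 & 0 & 0.3069 \\
      -0.1075 & -0.2929 & -0.3069 & 0
    \end{bmatrix}$$
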

we have that $P$ is a critical point of $\ell_U$ on $A\mathcal{M}_2$ and $U_{++} = 2$.
Having $Q$ defined as $P * Q = U$, we find that $Q\ (= Q')$ has full rank. But in the
alternating case the ML-dual variety is an affine translate of a determinantal
variety. We find that $B = S - Q$ equals

$$B = \begin{bmatrix}
      0 & -0.2638 & 0.2518 & -0.1344 \\
      0.2638 & 0 & -0.0924 & 0.0841 \\
      -0.2518 & 0.0924 & 0 & -0.0332 \\
      0.1344 & -0.0841 & 0.0332 & 0
    \end{bmatrix}$$

and indeed $B$ has rank $4 - 2 = 2$. We can actually compute the ML-degree of
$A\mathcal{M}_2$ symbolically to be 4 (even with the $u_{ij}$ treated as symbols). For
the data matrix $U$ above, the minimal polynomial for $q_{34}$ equals
$434217 q_{34}^4 - 1335767 q_{34}^3 + 1536717 q_{34}^2 - 764049 q_{34} + 127426$.
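
The numbers in Example 15 can be re-checked with a few lines of Python. A
$4 \times 4$ skew-symmetric matrix has rank at most 2 exactly when its Pfaffian
vanishes, so the sketch below uses the Pfaffian as a rank test. Because the
printed entries of $P$ are rounded to four decimals, only approximate vanishing is
expected; the helper names and the step size $10^{-3}$ are arbitrary choices.

```python
def pfaffian4(upper):
    """Pfaffian of a 4x4 skew matrix given by its upper-triangular entries."""
    return (upper[0, 1] * upper[2, 3] - upper[0, 2] * upper[1, 3]
            + upper[0, 3] * upper[1, 2])

pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
u = dict(zip(pairs, [x / 41 for x in (2, 3, 5, 7, 11, 13)]))
p = dict(zip(pairs, [0.0386, 0.0978, 0.1075, 0.1563, 0.2929, 0.3069]))

U_pp = 2 * sum(u.values())                    # equals 2, so Q' = Q
q = {ij: u[ij] / p[ij] for ij in pairs}       # P * Q = U
b = {ij: 1 - q[ij] for ij in pairs}           # B = S - Q above the diagonal

def min_poly(x):
    """Minimal polynomial of q_34 as stated in the example."""
    return (434217 * x**4 - 1335767 * x**3 + 1536717 * x**2
            - 764049 * x + 127426)

print("sum of p_ij:", sum(p.values()))
print("Pf(P):", pfaffian4(p))     # near 0: P has rank 2
print("Pf(Q):", pfaffian4(q))     # far from 0: Q has full rank
print("Pf(B):", pfaffian4(b))     # near 0: B has rank at most 2
# A sign change of the polynomial close to q_34 indicates a nearby root
print(min_poly(q[2, 3] - 1e-3), min_poly(q[2, 3] + 1e-3))
```

The entry $q_{34}$ is recomputed here as $u_{34}/p_{34}$ from the rounded $P$, so
it is only expected to lie close to a root of the quartic, not on it.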

5. Conclusion 

We have proved that a number of natural determinantal varieties of matrices are 
ML-dual to other such varieties living in the same ambient spaces. However, we 
have done so without formalizing what exactly we mean by ML-duality. It would be 
interesting to find a satisfactory general definition, perhaps involving the condition 
that $(P, U) \mapsto (\frac{U}{P}, U)$, or some variant of this that takes marginals
into account, is a birational map between the two varieties of critical points. Given
such a definition, it would be great to discover new ML-dual pairs of varieties, for
instance so-called subspace varieties [LW07] or varieties consisting of tensors of
given (border) rank.